SCOPE: Scalable Composite Optimization for Learning on Spark
Abstract
Many machine learning models, such as logistic regression (LR) and support vector machines (SVM), can be formulated as composite optimization problems. Recently, many distributed stochastic optimization (DSO) methods have been proposed to solve large-scale composite optimization problems, and they have shown better performance than traditional batch methods. However, most of these DSO methods might not be scalable enough. In this paper, we propose a novel DSO method, called scalable composite optimization for learning (SCOPE), and implement it on the fault-tolerant distributed platform Spark. SCOPE is both computation-efficient and communication-efficient. Theoretical analysis shows that SCOPE converges at a linear rate when the objective function is strongly convex. Furthermore, empirical results on real datasets show that SCOPE can outperform other state-of-the-art distributed learning methods on Spark, including both batch learning methods and DSO methods.
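For readers unfamiliar with the term, the following is a minimal sketch of what a composite optimization problem typically looks like for the two models named in the abstract. The notation (w, x_i, y_i, lambda, rho, P) is assumed here for illustration and is not taken from this page; the L2 regularizer is one common choice, not necessarily the one used in the paper.

```latex
% Composite objective: an average of n per-example terms (notation assumed).
\min_{w \in \mathbb{R}^d} \; P(w) = \frac{1}{n} \sum_{i=1}^{n} f_i(w)

% L2-regularized logistic regression, with labels y_i \in \{-1, +1\}:
f_i(w) = \log\!\left(1 + e^{-y_i x_i^{\top} w}\right) + \frac{\lambda}{2} \lVert w \rVert_2^2

% L2-regularized linear SVM (hinge loss):
f_i(w) = \max\!\left(0,\; 1 - y_i x_i^{\top} w\right) + \frac{\lambda}{2} \lVert w \rVert_2^2

% With \lambda > 0 and a convex loss, P is \lambda-strongly convex.
% A linear convergence rate then means, for some 0 < \rho < 1,
\mathbb{E}\left[P(w_t)\right] - P(w^{*}) \le \rho^{t} \left( P(w_0) - P(w^{*}) \right)
```

Under this reading, "linear convergence" means the gap to the optimum shrinks by a constant factor per outer iteration. The specific constant for SCOPE is given in the paper, not here.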
Similar resources
An Introduction to a New Criterion Proposed for Stopping GA Optimization Process of a Laminated Composite Plate
Several traditional stopping criteria in Genetic Algorithms (GAs) are applied to the optimization process of a typical laminated composite plate. The results show that neither of the criteria of the type of statistical parameters, nor those of the kinds of theoretical models performs satisfactorily in determining the interruption point for the GA process. Here, considering the configuration of ...
Flare: Native Compilation for Heterogeneous Workloads in Apache Spark
The need for modern data analytics to combine relational, procedural, and map-reduce-style functional processing is widely recognized. State-of-the-art systems like Spark have added SQL front-ends and relational query optimization, which promise an increase in expressiveness and performance. But how good are these extensions at extracting high performance from modern hardware platforms? While S...
SparkBench - A Spark Performance Testing Suite
Spark has emerged as an easy to use, scalable, robust and fast system for analytics with a rapidly growing and vibrant community of users and contributors. It is multipurpose—with extensive and modular infrastructure for machine learning, graph processing, SQL, streaming, statistical processing, and more. Its rapid adoption therefore calls for a performance assessment suite that supports agile ...
SCARFF: a Scalable Framework for Streaming Credit Card Fraud Detection with Spark
The expansion of the electronic commerce, together with an increasing confidence of customers in electronic payments, makes of fraud detection a critical factor. Detecting frauds in (nearly) real time setting demands the design and the implementation of scalable learning techniques able to ingest and analyse massive amounts of streaming data. Recent advances in analytics and the availability of...
A scalable system for primal-dual optimization
We present some of the most widely used architectures for Big Data, Hadoop and Spark, and develop several implementations exploiting the advantages of each. We implement a simplified version of the primal-dual optimization algorithm, described briefly in this paper, by choosing the smoothing functions to be ‖ · ‖2 with a zero center point. Under the assumption that data is provided as a sparse...
Publication date: 2017